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1. Introduction 

In this paper we discuss the use of sequential Monte Carlo (SMC) methods (al- 
ternatively termed particle methods) for likelihood-based inference in partially 
observed diffusions (PODs). The proposed method relies on a novel approach 
for estimating transition densities of diffusion processes via so-called generalised 
poisson estimators (GPEs). For the models under consideration, the likelihood 
function of the observed data cannot be expressed on closed- form; however, since 
partially observed diffusion models are, like more general latent variable models, 
specified using conditional dependence relations, this inference problem can be 
efficiently cast into the framework of the expectation-maximisation (EM) algo- 
rithm proposed by Dempster ct al. (1977). When applying the EM algorithm 
in the POD context there are two main difficulties: firstly, in all except a few 
cases, the transition density of the diffusion process, and thus the complete 
data log-likelihood function, lacks an analytic expression; secondly, computing 
the intermediate quantity of the expectation-step involves taking expectations 
under the smoothing distribution, i.e. the conditional distribution of the hidden 
states at the observation time points given the observed data record, which is 
not — even in the case of a known transition density — available on closed-form. 
These two issues make, as documented by several authors, MLE-bascd inference 
in PODs very challenging. In this paper we address these problems by applying 
the GPE suggested (as a refinement of results obtained in Bcskos ct al., 2006) 
by Fearnhead et al. (2008) in conjunction with SMC smoothing algorithms. Un- 
fortunately, it has been observed by several authors that using standard SMC 
methods in the smoothing mode may be unreliable for larger observation sam- 
ple sizes n, since resampling systematically the particles leads to degeneracy 
of the particle paths. As a solution, we adapt the fixed-lag smoother proposed 
by Olsson et al. (2008) to the framework of PODs. This technique relies, in 
the spirit of Kitigawa (1998), on forgetting properties of the conditional hidden 
chain; by this is meant that the hidden chain forgets its past when evolving, 
backwards as well as forwards, conditionally on the given observation sequence. 
The constructed algorithm avoids efficiently particle trajectory degeneracy at 
the cost of a bias which can however be controlled by a suitable choice of the 
introduced lag parameter. 

In order to obtain a high performance of the particle smoother it is in general 
necessary to propose (mutate) the particles according a kernel that takes the 
information provided by the current observation into account; indeed, mutat- 
ing, as in the bootstrap particle filter, the particles "blindly" according to the 
dynamics of the hidden Markov chain will often lead to severe degeneracy of 
the particle importance weights. However, such an improved proposal strategy 
is not straightforwardly adopted to PODs, since computing the resulting impor- 
tance weights involves computing a ratio of the transition density of the hidden 
diffusion process (for which a closed-form expression is missing in general) and 
that of the chosen proposal kernel. To cope with this, we follow Fearnhead et al. 
(2008) and replace each evaluation of the hidden process transition density by 
a draw from the GPE. Thus, the GPE serves two purposes in our algorithm 
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as it is used, firstly, for computing unbiased estimates of particle importance 
weights for a particle filter based on a proposal kernel different from the transi- 
tion kernel of the hidden diffusion process and, secondly, for estimating the EM 
intermediate quantity itself. 

The contribution of our study is fourfold, since the proposed intermediate 
quantity estimator 

1. approximates efficiently the expectation step in a single sweep of the data 
record, yielding an algorithm with a computational complexity of order 
0(nN); 

2. copes, as it is not based on any Euler discretisation or linearisation tech- 
nique, efficiently with model nonlinearities; 

3. has only limited computer data storage requirements, which is essential in, 
e.g., high frequency applications where sometimes very long measurement 
sequences are considered; 

4. is provided with a rigorous convergence result describing its convergence 
to the true intermediate quantity. This result is derived via a convergence 
result, obtained under minimal assumptions, for the GPE-bascd particle 
smoother. 

For models exhibiting poor mixing properties, in which case we cannot ex- 
pect a high performance of the fixed-lag smoother, we propose an alternative 
algorithm where the GPE is used in conjunction with the particle-based forward- 
filtering backward- smoothing procedure proposed by Godsill et al. (2004). This 
scheme, which relies on a decomposition of the smoothing measure that incor- 
porates the so-called backward kernels (i.e. the transition kernels of the hidden 
Markov chain when evolving backwards in time and conditionally on the ob- 
servations) of the model, avoids particle path degeneracy completely through 
an additional simulation pass in the time-reversed direction. Moreover, it does 
not suffer from the additional, model dependent bias of the fixed-lag smoother. 
However, these appealing properties are obtained at the cost of a significant 
increase of computational work, since the complexity of the scheme in question 
is quadratic in the number of particles. 

The paper is organised as follows: In Section 2 we recall the concept of PODs 
and discuss likelihood-based inference in such models via data augmentation 
and the EM-algorithm. GPEs are described in Section 2.1 and Section B, and 
Section 2.2 is devoted to SMC smoothing in general. In Sections 2.3 and 2.4 we 
introduce the fixed-lag smoother and the forward-filtering backward-simulation 
smoother, respectively; moreover, we discuss how these techniques can be ad- 
justed to PODs using GPEs. A theoretical result describing the convergence 
of the fixed-lag-based estimator is found in Section 2.3.1, and in Section 3 we 
illustrate the method on partially observed log-growth and genetics diffusion 
models. In Section 4, the paper is concluded by some final conclusions and 
remarks. Proofs arc found in Section A. 
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2. Preliminaries 

In the following wc assume that all random variables arc defined on a common 
probability space (Q, J-, P) and let E denote expectations associated with P. 
Denoting by 1 the indicator function and letting X be any random variable on 
(f2,.F), we will often make use of the short-hand notation E[X;A] = EfXl^]. 
Let X d = (X t ) t > be continuous-time diffusion process taking values in some 
space (X,X), with X C M. dx . More specifically, the dynamics of the process is 
governed by the the stochastic differential equation 



where W = (W t ) t > is Brownian motion. We denote by W*-^ the law of W 
given that Wq = x and let {Tt)o<t be the filtration generated by W. The 
functions n(-,9) and o~(-,9) are assumed to satisfy regularity conditions (locally 
Lipschitz with a linear growth bound) that guarantee a weakly unique, global 
solution of (2.1). We will consider a framework where the process X is only 

partially observed at discrete time points (tk)k>o through the process Y = f 
(Yk)k>o taking values in some measurable space (Y, y). The observations of Y 
are assumed to be, conditionally on the latent process X, independent and such 
that the conditional distribution Ge of Y k given X depends on Xt k only. In 
the following we write, in order to simplify the notation, Xk instead of Xt k . 
The dynamics of the diffusion as well as the measurement process depend on 
some unknown model parameter 9 which is assumed to belong to some compact 
parameter space O C M. de . Our main target is to estimate 9 using the maximum 
likelihood method. For simplicity we assume that the observation time points 
are equally spaced and denote by Qg and x the transition kernel and initial 
distribution, respectively, of the time homogeneous Markov chain (X k ) k >o. The 
family (Qg(x, -);x G X, 9 G 0) is dominated by the Lebesque-measure A with 
corresponding Radon-Nikodym derivatives (qg(x,-);x G X, 9 G 0). Moreover, 
suppose that Ge has a density function ge with respect to some measure /i on 



Given a record Yg [n = (Yq, Y\, . . . ,Y„) (similar vector notation will be used also 
for other quantites) of observations, a consistent estimate of the parameter 9 is 

ideally formed by maximising the observed data likelihood function l n (9\ Yo-.n) = f 
logL„(0; F :n), where 



dX t = n(X t ,0) dt + <i(X t: 9) dW t 



(2.1) 



(Y,y) such that, for k > 0, 





n 



g e (x ,Y ) x(dx ) Y\_9e(x k ,Y k )Qe(x k -i,dx k ) , 



k=l 



A problem with this approach is that we in general cannot compute L„ on 
closed-form, since this involves the evaluation of a high-dimensional integral 
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over a complicated integrand. Since the partially observed diffusion model above 
is, like more general latent variable models, specified using conditional depen- 
dence relations, computation of parameter posterior distributions is facilitated 
significantly by maximising instead the complete data log-likelihood function by 
means of the EM algorithm: Assume that we have at hand an initial estimate 9' 
of the parameter vector. In the EM algorithm an improved estimate is obtained 
by computing and maximising the intermediate quantity Q(9; ■) defined by 



?') d ^ E e 



n-l 



^2\ogq e (X k ,X k+1 



k=0 



k=0 



logg e (X k ,Y k ) 



Yo-.r 



(2.2) 

Here we have written Eg/ to stress that the expectations are taken under the 
dynamics determined by the initial parameter 9' . Under weak assumptions, re- 
peating recursively this procedure yields a sequence of parameter estimates that 
converges to a stationary point 9* of the observed data log-likelihood (Wu, 1983). 
As clear from (2.2), computing Q„ requires the computation of expected values 
under the smoothing distribution, i.e. the distribution of the state sequence X 0:n 
conditionally on the observations Yo :n , given by, for A e X^ n+1 \ 



MA; 9) 



def / ••• J A ge(x , Y ) x(dz ) IlLi 9e(xk,Y k ) Qe(x k ^i,dx k ) 



(2.3) 



Of special interest is the filter distribution, i.e. the distribution of X n condition- 

dcf 

ally on Yo :n , given by the restriction <f> n \ n (A) = </> n (X™ x A), A £ X, of the 
smoothing distribution to the last component. It is easily shown that the flow 
{4>k)kLo sa ti sncs the well-known forward smoothing recursion 



k+ i(A;9) = 



U(9;Y .. k ) 

Yo-.k+l) 



ge (xfc+i, ife+i) Qe {xk , dx k+1 ) (fi k (dx 0:k ; 9) 



where A £ x®( k + 2 ) . By introducing the (non-Markovian) transition kernel 
L k {x kl A; 6) d = / ge(x k+ i,Y k+1 )Q d (x kl dx k+1 ) 

J A 

for x k € X and A 6 X, we may rewrite the recursion (2.4) as 
fe+ i(A;0) = 



(2.4) 



/ J A L k (x k , dx k+ i ;6)4>k {dxo-.k ; 6) 
JJ L k (x k ,dx k+1 ;9)(/) k (dxo.. k ;9) 



(2.5) 



Here the normalised (Markovian) kernel L k (x k , A; 9)/L k (x, X; 9) is the so-called 
optimal kernel describing the distribution of X k +\ given X k = x k and the new 
observation Y k +\. 

In general, a closed-form solution of the recursion (2.4) is not available. 
A standard approach is thus to apply some SMC smoothing algorithm (de- 
scribed in in Section 2.2) to approximate the expectations in (2.2). Unfor- 
tunately, both the SMC smoother itself as well as the intermediate quantity 



J. Olsson and J. Strdjby/Particle-based inference in partially observed diffusions 



6 



(2.2) call for the transition density qe, which is usually unknown except in a 
few special cases. Nevertheless, results obtained by Beskos et al. (2006) and 
Fearnhead et al. (2008) offer a method for estimating this density without bias. 
A full treatment of this technique — which is a key ingredient of the estimation 
technique proposed here — is beyond the scope of this paper; nevertheless, the 
main framework and assumptions are described briefly in the next section. In 
addition, some more details can be found in Appendix B. 



r](-,6):ut-> I — - — — dv 



2.1. Generalised Poisson estimators 

Define the function 

r 

n(- 0) ■ ii i — y / 

a(v,9) 

and set X t d = rj(X t , 9). Denote by the inverse of any invertable function /. 
By applying Ito's formula we obtain the stochastic differential equation 

dX t = a{X t ,9)dt + dW t , (2.6) 

where 

f m def n{rf^{u,e),e} i , 

= v{r,^(u,6),6} + r {T] ( "'^ } ' 

for the transformed process X d = (X t )t>o- Using again the notation Xk = X tkl 
let qe be the transition density (with respect to the Lebesgue measure A) of 
{Xk)k>o- Then, straightforwardly, 

qe(x,x') = qe(x,x')\r)'(x',6)\ . (2.7) 

Assume the following: 

(Al) The process {M t )t>o, with 

M t d = exp Qf a(X s ,9) dX s + J a 2 {X s ,6) ds 

is a martingale with respect to ; 
(A2) a(-,9) is continuously differentiable; 

(A3) a 2 {-,6) + a'(-,6) is bounded from below by some function 1(9). 

Under these conditions, the GPE approach developed by Fearnhead et al. 
(2008) makes it possible to generate random variables Vg(x, x') with KVg(x, x') = 
q~e(x,x') for any (x,x ! ) S X 2 , i.e. V$(x, x') estimates the transition density qg 
without any bias, for a large class of diffusions of type (2.6). Then, letting 

V e (x,x') d = V e (x,x')\r)'(x',9)\ yields, using (2.7), EV e (x,x') = qe{x,x'). A full 
description of GPEs is beyond the scope of this paper; however, its main features 
are discussed in Appendix B. In this paper we represent the GPE by a kernel Pg 
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in sense that V g (x,x') ~ Pg(x,x', •). Similarly, using the related exact algorithm 
developed by Beskos et al. (2006), it is possible to construct a kernel Pg such 
that EV e (x, x', 6) = \ogqe{x, x') for draws V g (x, x', 9) ~ Pg(x,x', •). Appealingly, 
it is in many cases (see Section 3 for examples) possible to construct Pg and Pg 
such that the functions 9 t— > V g (x,x')(uj) and 9 >-> V s (x, x')(uj) are continuous 
for any fixed outcome w e fl, yielding unbiased estimates of qg and log qg for 
all 9 S 6 simultaneously. This useful property makes, as we will see, the GPE 
approach well suited to numerical (log-)likclihood function optimisation. 



2.2. GPE-based particle smoothing 



Since we in this part deal with the problem of sampling cj) k (-;9) for a given fixed 
parameter value, we will throughout this section expunge 9 from the notation. 
To begin with, we assume that we know the transition kernel density q. 

In order to describe precisely how SMC methods may be used for producing 
approximate solutions to the smoothing recursion (2.4), we suppose that we are 
given a weighted sample (<^o- fc| ' w fe)^=i °f particle and associated weights, each 
particle Co- k \ k = (£!|fc> • • * ' £fc|fc) being a random variable in X fc+1 , approximating 



in the sense that 



N 

<t>%(f) = (fif) _1 E^/(S#) « Mf) , (2-8) 

2=1 

where fl^ d = Y^,eLi W L f°r a large class of estimand functions / on X fc+1 . Now, 
in order to form an updated particle sample approximating 4>k+i, as a new 
observation Y k+ \ becomes available, a natural approach is to replace <p k in (2.5) 
by its particle approximation. This yields the mixture (recall the notation 5 a 
for a Dirac mass located at a) 

1N ,^def^ U'M&W*) f L k (C klk ,dx k+1 ) 



n+AA) = 2^ N £ t L x) ^ :fe|fc ( d ^0:fc) , 

i=\ 2^=1 w fc^felSfe|fe> A > JA ^ fe V?fe|fe>^ 

for A <E < Y®( fe + 2 ), Now, the aim is to simulate a new set of particles from 
4> k+ i and repeat this recursively to obtain particle samples approximating the 
smoothing distributions at all time steps. However, since we in general cannot 
neither simulate draws from the optimal kernel nor compute the mixture weights 
Lfc(£ju fe ,X), we apply importance sampling and draw new particles from the 
instrumental mixture distribution 

"ftiW d " E I (e klk ,dx k+1 



for A £ X®( k+2 \ where R k is a Markovian proposal kernel and {ip k )iLi are 
positive numbers referred to as adjustment multiplier weights. We will from now 
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on assume that ip k = ^fcC^o-fcifc) f° r somc nonnegative function ^ k '■ X fc+1 — > R + 
and that each kernel R k has a density r k with respect to A. Simulating a particle 
£o-fc+i|fc+i fr° m ^jt+i i s easily done by, firstly, drawing, according to the proba- 
bility distribution proportional to (u> k ifj k )fLii a mixture component (or ancestor) 
index II among {1, . . . , N} and, secondly, extending the selected ancestor with 

a draw from the proposal kernel, i.e. letting £o-fc+i|fc+i = (^o-fclfc' £fc+i|fc+i) W1 th 

£fc+i|fc+i ~ -^ fc (^feffe' ')■ After this, the drawn particle is assigned the importance 
weight 

4+1 = (& fc+1 | fc+1 ) , (2-9) 

where, for xoifc+i € X fc+2 , 

rk{Xk, Xk+i) 

implying u k+1 oc ^h+i/^k+ii^o-k+ilk+i)' Finally, the weighted particle sample 
formed by the updated particles and weights is returned as an approximation 
of <f>k+i - Moreover, since the filter distribution is the marginal of the smoothing 
distribution with respect to the last component, an estimate of <j>k+i\k+i is 
formed by the marginal sample {Ck+uk+v^k+i^S^l' 

Proposing and selecting the particles according to the dynamics of the latent 
process, i.e. without making use of the information about the current state 
provided by the current observation, by letting R k = Q and ty k = 1 for all k, 
corresponds to the bootstrap particle filter proposed by Gordon et al. (1993). 

The algorithm, which was developed gradually by, mainly, Handschin and Mayne 
(1969), Gordon et al. (1993), and Pitt and Shephard (1999), will be referred to 
as the auxiliary particle smoother (APS). In the setting of a partially observed 
diffusion process we do not have access to a closed-form expression of the transi- 
tion density q, which is needed when evaluating the importance weight function 
. However, the GPE makes it possible to estimate this density without bias 
via the kernel P. This yields following algorithm, in following referred to as the 
GPE-based particle smoother (GPEPS), in which q in the weighting operation 
(2.9) is replaced by the Monte Carlo estimate 

= -iV^O, (2.10) 

i=i 

where the V (x, x')'s are drawn independently from P(x,x' , •). Denote by 

$ fc+1 (x 0: fc+l) = 0(Xfe + i,Yfc+i)# fe [Xo-.k)— -, (2.11) 

r k (Xk,Xk+i) 

the resulting estimated importance weight function. One iteration of the GPEPS 
is described in detail in the following scheme. 
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Algorithm 1 

(* One iteration of GPEPS *) 

Input: (4fc|fc><4)£i> R k, " 

1. for i <- 1 to N 

2. simulate I{ ~ (44/ £f=x ^Vi)f =1 5 

3. simulate ^ +1 , fe+1 ~ Rk(^ k , ■); 

4- Set £o:fc+l|fc+l ^~ (^0-fc|fc'^fc+l|fe+l)' 

5. simulate V Ua (? ksk+1 \ k+1 ) ~ ^"(4^+1^+1 = 

6. compute via (2.11); 
7- set^ +1 ^^ +1 (^ 

8. return (^ :fc+1 , fc+1 ,4+i)ili- 



Here we have used the notations V 1:a (x, x') = f (V 1 (x, x'), . . . , V a (x, x')) and 

P® a (x, x' , ■) d = P(x, x', •) <8> ■ ■ ■ <8> P{x, x' , ■) (a times). Algorithm 1 extends the 
random weight auxiliary particle filter proposed by Fearnhead et al. (2008) to 
the smoothing mode. Note that we have, in the scheme above, suppressed the 
dependence of the particles and the particle weights on a from the notation for 
clarity. 

In the selection operation of Step (2), each particle index is drawn from the 
probability distribution formed by the adjusted weights (44/ ^2e=i 44)f=i- 
Letting M k denote the number of times that index i was drawn, the selection 
operation may be alternatively expressed as 

There are however many alternative ways of performing selection; e.g., one may 
set Ml ^ \_N44/Y,tr44\ + H with 

(Hi, . . .,H k ) 

where \_x\ denotes the integer part of a real number x and (x) = x — [x\ . In 
this selection schedule, which was proposed by Liu and Chen (1995) under the 
name deterministic plus residual multinomial resampling, index i is first copied 

1 N 44/T,f=i44l times ; the remaining T,?=i( N 44/ J2e=i 44) indices 
are hereafter drawn multinomially with respect to weights proportional to the 
residuals ((Noj k tp k / Yle=i 44))i ! =i- AH theoretical results obtained in the fol- 
lowing will hold for both the selection schedules (2.12) and (2.13). In addition, 
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our results are easily extended to selection schemes based on Poisson, binomial, 
and Bernoulli branching (see Douc and Moulines, 2008, for a theoretical analy- 
sis of these algorithms); however, since the number of drawn indices are random 
in this case, we omit these results for brevity. 

2.2.1. Convergence of the GPEPS 

We will describe the convergence, as N tends to infinity, of the self-normalised 
Monte Carlo approximations formed by weighted particle samples returned by 
Algorithm 1 using the concept of consistency (adopted from Douc and Moulines, 
2008) defined in the following. Let (H, 23(E)) denote some given state space and 
(£iv,i, u>N,i)iLi a E-valued particle sample. 

Definition 2.1. A weighted sample (£j\r,ii ^>N,i)i—i is consistent for a probability 
measure [i and a set C C L (S,/i) if, as N — > oo, 

N 

fi^ 1 X>iV,i/tev,i) ^M/)> for all feC, (2.14) 

i=l 

and, additionally, 

5l7 r max ojn i — > . (2.15) 

JV l<i<N 

The following assumption is mild (in fact, minimal) but essential when es- 
tablishing consistency of the GPEPS scheme. 

(M) For all0<k<n, e L 1 (X^ 1 , cf> k ) and L k (-,X) G L^X,^). 

Proposition 2.1. Assume (Al~4) and that the initial sample (£o) w o)£=i is con- 
sistent for (4>o, L 1 (X, 0o))- Then, for all 1 < k < n, each sample (Co-fcifc: 
produced by Algorithm 1 is consistent for ((f>k, L 1 (X fe+1 , 4>k))- The same is true 
when the multinomial selection schedule (2.12) is replaced by deterministic plus 
residual multinomial selection (2.13). 

The proof of Proposition 2.1 is postponed to Appendix A.l. 
2.3. Fixed-lag smoothing 

Unfortunately, it has been observed by several authors that using standard SMC 
methods in the smoothing mode may be unreliable for larger observation sample 
sizes n, since resampling systematically the particles degenerates the particle 
paths. Indeed, when k n, most (or possibly all) marginal particles (^^1^)^=1 
will coincide, resulting in a significant Monte Carlo error when estimating any 
expectation of Xj~ given Y . n using the produced particles. Especially, returning 
to the problem of estimating the intermediate quantity Q n in (2.2), for any 

type of additive functional t(xQ- n ) d = X)fc=o s k(xk-.k+i), ( s fc)fc=o being a set of 
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functions (cf. the two terms of (2.2)), we may expect that the estimator 

n-1 N 

(^"TE^^+Hn) (2-16) 

fc=0 4=1 

of E[i(Xo :ra )|Yo :ra ] is poor when n is large. To compensate for this degeneracy the 
particle sample size N has to be increased drastically, yielding a computationally 
inefficient algorithm. 

On the other hand, since we may expect that remote observations are only 
weakly dependent, it should hold that, for a large enough integer A„, 

E[s k {X k:k+1 )\Y 0:n ]^E[s k (X k:k+1 )\Y 0:k{An) ] , 
where k(A n ) = f min{&; + A n , n}, yielding 

71—1 71—1 

E[t(X 0:n )\Y : n } = J2 E ^( x k:k+i)\Y 0:n } Ps J2 E [sk(X k:k+ i)\Y 0:k{An) ] . 

k=0 k=0 

(2.17) 

Thus, as long as the approximation (2.17) is relatively precise for a A„ which is 
smaller than the average particle trajectory collapsing time, i.e. most marginal 
particles (£ k i k r A ))fli ar c different for all fc, we should replace (2.16) by the 
estimator 

n-1 _ 1 N 

:fc+l|fc(A„) 

(2.18) 

k=0 i=l 

The lag-based approximation (2.18) may be computed recursively in a single 
sweep of the data with only limited computer data storage demands, and com- 
puting (2.18) is clearly not more computationally demanding than computing 
(2.16) (having 0(nM) complexity); see Olsson et al. (2008) for details. Finally, 
using (2.18) in conjunction with the kernel Pg for estimating logqe gives us the 
following approximation of the intermediate quantity Q n {9] 9'): 

n-1 _ 1 N 

fe=0 i=l 

where, for (ar, x') £ X 2 , 

1 a 

and 

V e u& (x,x')~P®«(x,x',-). 

In (2.19) we have added 9' as an index to the particles as well as the associated 
weights to indicate that the particle system of the fixed-lag smoother is evolved 
under the dynamics determined by the initial parameter value. 
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2.3.1. Convergence of the intermediate quantity 

Under weak assumptions on the functions ^ k , the kernels L k and P, and the 
local likelihoods functions log <?#(•, Y k ) one may establish the convergence of the 
approximate intermediate quantity (2.19). Thus, define, for a given lag A n and 
parameters (0,9'), the bias 

n— 1 ~ 

b n (A n ,9,e') = J2 / s k {x k .. k+1 ,e)^ k{An) {Ax k .. k+u e') 

k=Q 

~Y1 / sk(x k , k+1 ,e)^ n {dx k:k+1 ,e') (2.20) 

fc=0 

imposed by the fixed lag. We then have the following result, which is the main 
result of this section. 

Theorem 2.1. Assume (Al-3). Let n > 0, (9,6') £ Q 2 , and (A n ,a,a) G N 3 . 
Suppose that (A4-) holds for ty k (-;9'), L k (-;9'), and <p k (-;9') and that the initial 
sample (Q e ,ujq 9 )fl 1 is consistent for (<j>o(-; 9'), L 1 (0o(s 9'), X)). Moreover, as- 
sume that the mappings £o:fc(A„) \ogge(x k ,Y k ), < k < n, and Xo:fc(A„) l— ^ 
/ \v\P d (x k ,x k+ i,dv), < k < n, belong to L 1 (0 fc(An) (-; 9'), X k( - A ^ +1 ). Then, as 
N -> oo ; 

Qn (0, 0') A Qn(9, 9') + b n (A n , 9, 9') , 
where the bias b n is defined in (2.20). 
The proof is given in Appendix A. 2. 

The bias term b n , which was treated by Olsson et al. (2008), is controlled by 
the speed with which the hidden chain (X k ) k >o forgets its initial distribution 
when evolving conditionally on the observations. Indeed, when the state space 
X is compact it can be shown (see Olsson et al., 2008, for details) that b n is 
0(np An ), where < p < 1 is the uniform (with respect to observation records 
Yo.n as well as initial distributions x) mixing coefficient of the conditional chain. 
From this we deduce that the lag A n should be increased with n at the mini- 
mum rate clogn,, c > — 1/logp in order to keep the bias suppressed. Increasing 
A n faster eliminates the bias and increases the variance of the approximation; 
see again Olsson et al. (2008) for a detailed study of these issues. Since a sim- 
ilar forgetting property holds also in the case of a non-compact state space X 
(Douc et al., 2009a), the same arguments can be applied for very general mod- 
els; however, the analysis of the general case is significantly more involved, since 
the mixing coefficient is neither uniform with respect to observation records nor 
initial distributions x m this case. 

Remarkably, the convergence result in Theorem 2.1 holds for any fixed sample 
sizes (a, a). In particular, nothing prevents us from letting a = a = 1, yielding 
a computationally very efficient algorithm; this is the choice of Section 3. 
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2-4- Forward- filtering backward- smoothing 

Even though naive SMC implementations generally fail to estimate joint smooth- 
ing distributions efficiently, they can, as discussed above, be successfully used 
for estimating the marginal filter distributions (corresponding to k = n in the 
discussion of Section 2.3). Nevertheless, any joint smoothing distribution may 
be expressed in terms of marginal filter distributions via the so-called forward- 
filtering backward-smoothing decomposition. Indeed, for any probability measure 
r) on (X, X), define the reverse kernel 

<— , i , def f a Qe(x,x') ri(dx) 
J qe{x,x')T](dx) 

where A £ X and x' £ X. The definition (2.21) is valid only when x' belongs 
to the subset of X where the denominator is nonzero; outside this set we may 
let Q v take arbitrary values. It can now be shown that (see e.g. Cappe et al., 
2005, Corollary 3.3.8) 

P P n—1 

ct> n {A;6) = / •■• / (f> n \ n (dx n ;6) [J ^^ k (x k+ i, dx k ; 6) , (2.22) 

for A £ Using the Markovian structure of the decomposition above, a 

trajectory Xo-. n can be simulated from </>„(•; 9) by, firstly, computing recursively 
(via (2.4)) the filter distributions {<f) k \ k (-; ^))I-=o an dj secondly, simulating X n 
from <j) n \ n (-; 9) and hereafter, recursively for k = n — 1, n — 1, . . . , 0, X k from 

Q<t> klk (Xk+i, •; 9). This scheme will in the following be referred to as forward- 
filtering backward- simulation (FFBS), and we refer again to Cappe et al. (2005) 
for a detailed treatment. 

In general we lack closed- form expressions of the filter distributions, but 
may estimate these efficiently using Algorithm 1 . Hence, following Doucet et al. 
(2000), a non-degenerate particle estimate of 4>o: n {-',9) can be obtained by re- 
placing, in the decomposition (2.22), <j) n \ n by the empirical measure 4>^ n and 

the reverse kernels Q^ k , k (x k+ i, dx k ] 9) by 



<- 



N 



■ A' k 



(x k+u dx k ;9) = j: - ^ 6, Uk (dx k ) . (2.23) 

i=i L,t=i u k <leK£ k \ k > x k+i) 



Note that a draw according to Q j,n (x k+ i, •; 6) consists of selecting position 

^ k | k 

with probability proportional w k qe((, k \ k , x k+ i)/ Y^i=i w fc9e(Cfc|fcj x k +i)- In 
the case of PODs, a closed-form expression of qg is in general missing, and we 
thus replace each number qg(^ k , x k +i) by a draw Vg{C k \ k i ^fc+i) from the GPE 
Pe{£, k \ k i x k+i, ■)• This gives us the following algorithm for simulating a trajectory 
Xo-.n that is approximately distributed according to 4> n . 
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Algorithm 2 

(* GPE-based particle FFBS *) 
Input: (RkYkZo 

1. run Algorithm 1 to obtain (0ju fe (-; 6>))£ =0 ; 

2. simulate X n ~ <^ n (-; 0); 

3. for k <— n — 1 to 

4. for i <- 1 to AT 

5- simulate V e {^ k ,X k+1 ) ~ P e (^ |fc , X fc+ i, •); 

6. simulate i h ~ (w^*,*, X fc+1 )/ E^i ^fc^(^| fc , **))<=i5 

7. set X k <- Q fe 

8. return X 0:n = (X , . . . , X n ). 

Algorithm 2 avoids the problem of degeneracy of the genealogical tree with- 
out any implicit assumption on geometrical crgodicity of the conditional hidden 
chain. On the other hand, simulating a single trajectory according to Algo- 
rithm 2 involves O(N) operations, implying an overall computational cost of 
order 0(N 2 ) for producing a sample of size N. Recently, Douc et al. (2009b) 
showed how the overall computational cost of the particle-based FFBS can be 
reduced to O(N) by means of accept-reject-methods; however, it is not straight- 
forward to adapt this approach to our framework, since one for general PODs 
cannot find an upper bound on the transition density of the hidden chain. For 
models with forgetting properties, Algorithm 2 should be outperformed by the 
fixed-lag smoother because of the quadratic complexity of the former scheme 
(see the coming section for examples) ; the FFBS should thus be seen as a generic 
and alternative solution in cases of poor mixing. 

3. Simulation study 

In this section, the proposed methods are illustrated on two simulated examples, 
consisting of noisy observations of the models treated by Beskos et al. (2006) and 
Beskos et al. (2008). In both examples we let, for simplicity, the measurement 
noise variance a e be known and set to 0.1 and assume equidistant measure- 
ments with tk+\ — tf. = 1 for all k > 0. We use consequently a = a = 1. The 
approximate intermediate quantity is maximised using the Nelder-Mead 
simplex algorithm as implemented in MATLAB's fminsearch-command. In or- 
der to obtain convergence of the parameter sequence returned by the Monte 
Carlo EM-algorithm, it is necessary to decrease, at each iteration, the bias of 
the particle approximation by increasing the number of particles with the iter- 
ation index. We thus follow the recommendations of Fort and Moulines (2003) 
and increase the particle sample size as the square root of the iteration number, 
with an initial size of 100 particles. A detailed discussion on the effect of the 
lag size on the quality of the final parameter estimates is given in Olsson et al. 
(2008); thus, we do not repeat this discussion here and stick consequently to 
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the recommendation of increasing the lag logarithmically with the size of the 
observation record. 

3.1. Log-growth model 

In the first example we estimate, from simulated data, the parameters of a 
partially observed version of the log-growth model discussed by Bcskos et al. 
(2006). The model is specified by the following system of equations: 

dX t = KX t (l-X t /A)dt + aX t dW t , 
Y k = X tk + a e ek , 

where (ek)k>o are mutually independent, standard normal-distributed random 
variables. The noise sequence is supposed to be independent also from W . 
Applying Ito's formula to the transformation X t = rj(Xt,o~), with r](x,a) ^ 
— log(x)/er, yields 

dX t = a(X t ) + dW t , (3.2) 

where a(x) = f a/2 — a/ a + k/(ctA) cxp(— ax). Since a is bounded from above, 
we are only required to simulate the minimum of the Brownian path and let 
W~ be a evaluated at this minimum; see Section B for the meaning of W~ . The 
minimum of the Brownian bridge has a known law, and given the minimum, the 
bridge can be constructed retrospectively using Besscl bridges (see Beskos et al. , 
2006). Our aim is to estimate the unknown parameters 8 *== (k, A, a) given a 
record ioiiooo of observations. The observation set was obtained through simu- 
lation under the parameters 9* = (0.1, 1000, 0.1). When computing the approx- 
imate intermediate quantity , the random weight fixed-lag smoother used 
the lag A„ = 40 and the proposal 

R k (x,A) = — I t({x' -kx(1 -x/k)}/{ax};A)dx' , (3.3) 
ox J A 

where t(-;n) denotes the density of the student's i-distribution with n degrees 
of freedom. Further the adjustment multiplier weights are set to 1. The proposal 
(3.3) is obtained by discretising the hidden dynamics using the Euler scheme. 
We set a = a = 1. The EM output is presented in Figure 3.1. 

For comparison, the estimation problem of the log-growth model was also 
solved using the GPE-based particle FFBS in Section 2.4. The setup was the 
same as for the fixed-lag smoother, but due to the significant higher computa- 
tional cost of the FFBS scheme (recall Section 2.4) the number of observations 
was reduced to 100. For the FFBS-based procedure, the GPE needs to be eval- 
uated N + 1 times per particle and time step, i.e., once in the forward filtering 
pass and N times in the backward simulation sweep, compared to only once for 
the fixed-lag smoother. 

The output of the EM learning curves obtained using the GPE-based particle 
FFBS is presented in Figure 3.1. 
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Fig 1. Convergence of A (solid, left y-axis), n (dashed, right y-axis), and a (dotted, right 
y-axis) using the fixed-lag smoother with lag 40. 
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Fig 2. Convergence of A (solid, left y-axis), n (dashed, right y-axis), and a (dotted, right 
y-axis) using the GPE-based particle FFBS on 100 observations. 
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3.2. Genetics diffusion model 

In a second example we estimate, again from simulated data, the parameters 
of a partially observed version of the genetics diffusion model presented in 
Kloeden and Platen (1992) and discussed by Beskos et al. (2008). The model 
is given by 

dV t = (u + vVt)dt + o-Vt(l-Vt)dW t , 
Y t k = V tk + cr e e k , 

where the sequence (ek)k>o is as in the previous example. Applying Ito's formula 
to the transformation X t = rj(V t ,o-), where rj(v,cr) = (log(u) — log(l — v))/a, 
allows for using the GPE for estimating the transition density of the latent 
process. In this case, the drift function a of the transformed process becomes 
more involved than in the previous example, and it is neither bounded from 
above nor below. Thus, we have to draw both W~ and W^f and a Brownian 
bridge {W S Y S=0 such that W~ < a(W s ) < W+ for all < s < t; see Section B 
for a justification of this. For this purpose we apply the method proposed in 
Beskos et al. (2008), which involves sampling first a maximum and a min- 
imum W^, and then a Brownian bridge such that < W s < for all 
< s < t. Since a linear transformation of a Brownian bridge is still a Brownian 
bridge, it suffices to consider the case when the path (W s )l =0 is conditioned 
to start and end in zero. Sampling a lower and upper bound can then be done 
by using rejection sampling in the following way: let {a{)i>o with ag = be an 
increasing sequence and consider the intervals (—ai,ai\. Since the probability 
that a Brownian bridge stays in a specific interval [— K, K] has a known ex- 
pression (having the form of an infinite series), it is possible to calculate the 
probability that it is contained in (— a^a^] but not in (— <Zj_ i, Oi— i]; this means 
that either its maximum is contained in (fli—i, a^] or its minimum is contained in 
(— a.;, — eti_i] or both. Thus, we first propose an interval (aj_i,aj]; given this in- 
terval, we then propose, with probability 1/2, a maximum conditioned to belong 
to (a,_i, Oi], otherwise a minimum in (—a;, — aj_i]. Since the distributions of the 
maximum and minimum are known on closed-form, this is easily done. Next, 
we propose a Brownian bridge by decomposing around the proposed maximum 
(minimum) as in the previous example. The resulting path (W s )l =0 is accepted, 
with a probability depending on the path in question, only if it remains in the 

interval; see Beskos et al. (2008) for details. Finally, we set W± d = a(W^). 

Again we attempt to estimate the unknown parameters 9 d = (/i, v, a) given a 
record Voaooo of observations obtained through simulation under the parameters 
9* = (0.05, 0.1, 1). When computing the approximate intermediate quantity , 
the random weight fixed-lag smoother used the lag A„ = 20. Since the state 
space K(0, 1) is compact, we propose the particles by simply drawing uniforms 
over (0, 1). We set a = a = 1. The EM output in presented in Figure 3.2. 
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Fig 3. Convergence of a (dotted, left y-axis), v (dashed, right y-axis) and fi (dotted, right 
y-axis) . 



4. Conclusion 

Parameter inference in general discretely and partially observed diffusion pro- 
cesses is an inherently difficult problem due to the lack of closed-form transi- 
tion densities of the hidden Markov chain. Assuming the possibility of simulat- 
ing exactly transitions of the latent diffusion process, it is possible to produce 
pointwise and consistent estimates of the likelihood function using the stan- 
dard bootstrap particle filter, in which the particles are assigned importance 
weights determined completely by the known local likelihood function. In such 
a framework, the likelihood surface can be explored using e.g. grid-based meth- 
ods (Olsson and Rydcn, 2008). Ionidcs ct al. (2009) use the bootstrap particle 
filter for computing pointwise approximations of the score function and locate 
the maximum likelihood estimate by means of stochastic approximation. How- 
ever, simulating exactly transitions of a diffusion process is in general infeasible 
and we are most often referred to discretisation-based methods such as the Eulcr 
scheme, imposing a nontrivially controlled bias of the final parameter estimates. 
Moreover, mutating blindly, as in the bootstrap particle filter, the particles 
without incorporating, in the proposal kernel, the information provided by the 
observations will in general lead to serious degeneracy of the particle weights, 
especially for models where the observations are informative. 

Thus, in the present paper we proposed an alternative, EM-based method 
for estimating unknown parameters of PODs. The method combines recent ap- 
proaches for estimating efficiently the joint smoothing distribution in hidden 
Markov models with recently proposed techniques for estimating, without bias, 
transition densities of a large class of diffusion processes via GPEs (Beskos et al., 
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2008). Interestingly, the GPE provides a way of producing unbiased estimates 
of the transition densities simultaneously for all parameter values; this is critical 
when carrying through the maximisation-step of the EM-algorithm. For models 
having forgetting properties, the degeneracy of the particle trajectories can be ef- 
ficiently avoided by means of fixed-lag smoothing (Kitigawa, 1998; Olsson et al., 
2008). The decrease of variance gained by the fixed-lag approximation is ob- 
tained at the cost of a bias; the bias is however easily controlled by increasing 
logarithmically the size of the lag with the size of the observation record, yielding 
an algorithm of 0(N) computational complexity. We provide a detailed study of 
the convergence of the GPE-based particle smoother as well as the full interme- 
diate quantity of EM. The results are obtained under, what we believe, minimal 
assumptions and may, since we analyse separately the GPE-based mutation step 
(Lemma A.l), be extended to any selection schedule for which consistency has 
been established in the literature. In this way, our GPEPS convergence results 
differ significantly from that presented in Fearnhead et al. (2008). In the non- 
ergodic case, we proposed a method for sampling the joint smoothing distribu- 
tion which is based on the forward-filtering backward-smoothing decomposition 
of the same. Basically, the method, which relies on an algorithm proposed by 
Godsill et al. (2004) and analysed further by Douc et al. (2009b), consists of a 
forward-filtering pass followed by a backward-simulation pass where trajecto- 
ries are drawn according to approximations of the backward kernels obtained 
using the particle filter estimates obtained in the forward pass. During the two 
passes we replace, when needed, any evaluation of the diffusion process tran- 
sition density by a draw from the GPE. At the end of the day, we obtain an 
0(N 2 ) algorithm that is significantly more costly than the fixed-lag smoother, 
but which avoids elegantly the problem of degeneracy of the genealogical tree 
of the particles. The methods were successfully demonstrated on two examples. 

There exist alternative techniques, either Monte Carlo-based (see e.g. Pedersen, 
1995) or based on basis expansions (A'it-Sahalia, 2008), for approximating the 
transition density. Nevertheless, none of these approaches produce unbiased esti- 
mates. The former is, while quite general, computationally very demanding and 
the latter is only valid for very short time intervals (recall that the performance 
of the GPE is independent of the size of the time grid). Sometimes more direct 
numerical approaches, such as solving the Fokker-Plank equations or taking the 
Fourier inverse of the characteristic function of the SDE, are possible; however, 
these methods often tend to be computationally expensive. Anyway, the theoret- 
ical results obtained by us presume only unbiascdncss of the transition density 
estimator, and thus other approximation schemes may be applicable within our 
framework. 

Appendix A: Proofs 

The proofs of Proposition 2.1 and Theorem 2.1 rely on recent results on limit 
theorems for weighted samples obtained by Douc and Moulines (2008). Since 
we in this section deal exclusively with asymptotic properties of the sample as 
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the sample size tends to infinity, we let, when not specified differently, the limit 
notation — >• refer to an increasing number N of particles only. In addition, we let 
also the particles and the associated weights be indexed by N for clearness. The 
following kernel notation will be useful in the following: Let \x be a measure on 
(H, B{S.)), f a measurable function on (H, 23(E)), and K a kernel from (E, 23(E)) 
to (E, 23(E)); then we set 



The following definition specifies the structure that we want any class of esti- 
mand functions to have. 

Definition A.l. A setC of measurable functions on E is proper if the following 
holds. 

(i) C is a linear space; that is, if f and g belong to C and (a, ft) € R 2 , then 



(ii) if g £ C and f is measurable with \ f\ < \g\, then f G C; 

(iii) for all c£M, the constant function £ i— > c belongs to C. 

We will frequently make use of the following lemma obtained by Douc and Moulines 
(2008). Let (f2, T, P) be a probability space and (TN.ijfLo, 2V > 1, a triangular 
array of sub-cr-fields of T such that .Fjv,j-i Q ^N,i for all 1 < i < N and N > 1. 
In addition, let (UN,i)iL\; N > 1, be a triangular array of random variables such 
that each t/jv,? is J-Ar^-rneasurablc. 

Theorem A.l (Douc and Moulines (2008)). Assume that K[\Unj\\J-nj-i] < 
oo, W-a.s., for all N > 1 and 1 < j < N . Suppose that 

(i) as X — > oo, 




and 




af + pg E C; 




(A.1) 



(ii) in addition, for all e > 0, 



N 




(A.2) 



as N — > oo. Then 
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A.l. Proof of Proposition 2.1 

Algorithm 1 is conveniently analysed within a more general framework of ran- 
dom weight mutation (RWM). Assume that we are given a S- valued, weighted 
particle sample {^N.i,WN.i)iLi which is consistent for some measure v on £>(H) 
and let L be a finite transition kernel from (E,£>(H)) to (H, £>(E)). We wish 
to transform (£,N,i,^N,i)iLi m to another sample (£,N,i,&N,i)iLi targeting the 
measure 

M= V -^ AeB&, 

by means of the RWM operation described below. The input parameters are: a 
proposal kernel R such that i?(£, •) dominates £(£>') for all £ 6 S, a random 
weight kernel S from (S x 3, B(B x 3)) to B(M+)) targeting dL/di? in the 
sense that, for all (£, £) £ S x H, 

and, finally, a Monte Carlo sample size a £ N. 
Algorithm 3 

(* random weight mutation *) 
Input: (£Ar,»,WAr,i)ili, i?, S, a 

1. for i 1 to JV 

2. do simulate £/v,i ~ R(£,N,i, ■)', 

3. simulate V^ a ^ N<i , £ N<i ) ~ S® a _(6r,i, 6r,i, •); 

5. return (f^, Lb N ,i)f = i- 

The sample (£n,i, &N,i)iLi returned by the algorithm is taken as an approxima- 
tion of fi. In order to evaluate the quality of this sample, define the set 

C d =i f {/£L 1 ( M ,S):L(- ) |/|)£C} ; (A.3) 

then the following result stating consistency for weighted samples produced by 
Algorithm 3 is instrumental when establishing Proposition 2.1. 

Lemma A.l. Assume the weighted sample (^N,i,^N,i)iLi is consistent for (y, C) 
and that the function L(-,3) belongs to C. Then the set C defined in (A.3) and 
the weighted particle sample (£,N,i,&N,i)iLi produced by Algorithm 3 are proper 
resp. (fj,, C)-consistent for any fixed a £ N. 

Proof. Properness of the set C is straightforwardly established: To check Prop- 
erty (i) in Definition A.l, suppose that / and g belong to C and let (a, (3) £ K 2 ; 
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then 



\af(0+j3g(Z)\vS(;Z,dv)R(;dO 

I \f(£)\vS(;idv)R(;d£) 



\g(i)\vS{;ldv)R(;d£) 

= \a\L(;\f\) + \P\L(;\g\) , 



where the function on the right hand side belongs to C by construction of C and 
the fact that C is a linear space. That the integral on the left hand side belongs 
to C is now a consequence of Property (ii) in Definition A.l. Properties (ii) and 
(iii) are checked in a similar manner. 

To establish Condition (2.14) in Definition 2.1 it is enough to show that, for 
all / 6 C, 



TV 



(A.4) 



indeed, since C contains the unity mapping £ M- 1 (as C is proper), (A.4) implies 
that 

TV 

SJ^^A.LtS), (A.5) 

from which Condition (2.14) in Definition 2.1 follows by Slutsky's lemma. Thus, 

we define the triangular array UN,i = f UN.if(£,N,i)/QN, N > 1, 1 < i < N, 

and sub-er-ficlds J 7 ^ d = <t{(£,na, UN,i)iLi}, N >1. We then get, by applying the 
tower property of conditional expectations and the consistency of the ancestor 
sample, 



TV 



i=l 

TV 

= fi^ 1 ^ w TV,iE 



"TVJ 
TV 



E 









6v,i, J~N 







1=1 

Ojv'E^ f f(£) f vS^ N , u ldv)R^ N , u di) 
i=i J "* 

TV 

= ^TV / ,UN,iL(tN,i, f) — > vL(f) , 



since L(-, /) < L(-, \ f\) G C. To show that Yj%=i Un,% tends to J2"=i ^[UnaIF 
in probability, implying (A.4), we apply Theorem A.l. In order to establish 



-TV 



TV J 
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the first condition of that theorem we reuse the arguments above and use that 
L(-, |/|) e C, yielding the limit 



N 



^2E[\U N ,i\\T N ]-^uLQf\ 



Now, since convergence in probability implies tightness, we conclude that Con- 
dition (i) in Theorem A.l is fulfilled. 

To verify (ii), define, for some e > 0, An d = E[|C/at,*| ; Wn,i\ > el-^jv]- 

Since, as the ancestor sample is assumed to be consistent, maxi<i<jv ujn,i/Qn 
vanishes in probability as N tends to infinity, the same holds for the product 
A n ^{C max.i< i<N u)N,i > c&n}, where C > is an arbitrary constant. On the 
other hand, 



AnI < C max ujn i < 

Ki<N 



N 

i=l 



\U N ,i\; \f(tN,i)\ ^Vfe,^) > aC 



N 



= n N 1 Y,^ f 1/(1)1 / 



VlS 9a (Z N ,i,£,dVl: a )R(ZN,i,d£) 



i= i » 1/(01 E?=i 

Now, since, for all £ £ S, 

1/(01 / v^&ldvx.^R&dOKL&lfl) , 

• / l/(l)IE?=i^>«c 

where L(-, |/|) £ C, we conclude, using Property (ii) of Definition A.l, that the 
mapping 

^ I |/(0I / vrS^&ldv^R&dt) 

J •'l/(0IE2=i«<> aC ' 

on S belongs to C as well. Thus, consistency of the ancestor sample implies that 

N 



i=l 



|C^,i|;|/(6r,0lE^^.«'^.*) ^ aC 



1/(01/ v 1 S^ a ^^dvi: a )R(C,dO^) ■ (A.6) 

1/(01 E?=i*<>«c 

In addition, since the constant C may be chosen arbitrarily large, the limit 
(A.6) can be made arbitrarily small by the dominated convergence theorem. We 
hence conclude that An tends to zero in probability as TV tends to infinity. This 
establishes (A. 4). 
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In order to establish (2.15) it is, by Slutsky's theorem and (A. 5), enough to 
prove that 



Cl N max ujns — > 



N l<i<N 

Thus, take again a constant C > and write 



(A.7) 



fl^ max WAT.il \ V V e {^ N , i ,i N , l ) > aC } 

l<i<N z — J 

— u=i ) 

N ( a ~j 

<n„ 1 Y,u*A\^ v \&,i>ZN,i)ZaC\ . (A.8) 
i=l U=l J 

To prove that the right hand side of (A.8) converges, we introduce the triangular 

array U N ,i = w*,il{E2U ^(6v,i, £v,i) > aC}/n N , N > 1, 1 < * < N, and let 
the sub-er-ficlds .Fat, iV > 1, be defined as above. Next, we use again Theorem 
A.l. To verify the first condition, take conditional expectation with respect to 
J-n and reuse (A. 6) with / being the unity function; this yields 

AT 



^2E[U NA \T N ] 



i=l 



J2e=i v i>aC 



?;iS 8Q (^,d Ul:Q )i?(£,d£Md£) 



implying (i). To verify (ii), take an e > and define An = f J2iLi E[|i7jv,i | ; \Un,%\ > 
el^jvl. Then 



A' 



1=1 

implying that, for an arbitrary constant C > 0, following the lines of (A. 6) 
A N 1 \c" 



N 



max ujm i < eQpj 
Ki<N ' ~ 



AT 



< 



i=l 



V^N^tNjy^V'fa^iifj) > a{CVC') 



1=1 



N 



E? = i»*>a(cvc) 



ViS^foldvi^Rfodi)^) . (A.9) 



On the other hand, 

fi^ 1 max WAr.il < ^ V e (£ Nii ,£ NA ) < aC > < max u Ni 

Ki<N t-^ Ki<N 



Thus, since the limit (A.9) can be made arbitrarily small by increasing C , we 
conclude that An tends to zero as N tends to infinity. This in turn implies that 
the upper bound in (A.8) tends to 



«iS® a (£,|,dui :Q )i?(e,dOi/(dO 



(A.10) 
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Finally, we complete the proof by noting that (A. 10) can be made arbitrarily 
small by increasing C. □ 

We now use Lemma A.l to prove consistency of Monte Carlo estimates pro- 
duced by the GPEPS. For this purpose, let C . k \ k = £o*fc| fc , 1 — * — ^> denote the 
selected particles obtained in Step (2) of Algorithm 1. Consequently, the sample 
(£o-fc|fc)fci is obtained by resampling the ancestor particles (Co-felfc)i^i mvutino- 

mially with respect to the normalised adjusted weights (w^^/ X^=i ^fe^D^i- 
This operation will in the following be referred to as selection. Using this nota- 
tion and terminology it is now possible to describe one iteration of the GPEPS 
by the following three transformations: 

(M A \N I: Weighting i N 



II: Selection . — ^ -i\N III: Mutation . ^ i \N 

ls0:fc|fc> lji=l _ (.S0:fc+l|fc+lJ w fc+l)i=l 



Here the third operation refers to the random weight mutation procedure de- 
scribed in Algorithm 3. 

To prove Proposition 2.1 we proceed by induction and assume that (£J . k , k , uj z k )fL 1 

is consistent for L 1 (X fc+1 , <pk))- Next, we show how consistency is preserved 
through one iteration of the algorithm by analysing separately Steps (I-III). 
Step I. Define the modulated smoothing measure 



<Pk{v k ) 

then the weighting operation in Step I can be viewed as a transformation ac- 
cording Algorithm 3 with 3 = X" +1 , S = X™ +1 , and 



R(x :k,A) = S Xo . k {A) , 
L(xo-.k, A) = %k(x :k)S X0:k (A) , 
S(x 0:k ,x' . k ,A) = 8y k ( x > o: j(A) . 

Thus, by applying Lemma A.l we conclude that (^o-fc|fc ' '^fc aJ A;)i^=i 1S consistent 
for (j>k(^k) an d the (proper) set 

{/ e L 1 (0fc(*fc),X n+1 ) : tt fc |/| e L 1 ((/) fc , X" +1 )} = L 1 (<^fe(*fc),X n+1 ) . 

Step II. Applying Theorem 3 in Douc and Moulines (2008) gives immedi- 
ately that (£o:fc| fc5 is consistent for [4> k {^k), L 1 (0 fe (# fc ),X re+1 )] for both the 
selection schedules (2.12) and (2.13). 
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Step III. Also the third step is handled using Lemma A.l. In this case, we 
set S = X n+1 , S = X"+ 2 , and 

'v = (pki^k) , 
(j, = (f> k +i , 

( R(x 0:k ,A) = f A 8 XO:k (dx' 0:k ) R k (x' k ,dx' k+1 ) , 
L(x 0:k ,A) = J A ^ k {x^ k+1 ) S X0:k (dx' . k ) R k (x' k ,dx' k+1 ) , 
S(xo:k,x' . k+1 , A) 

= /iA{w3K +1 ,n+i)/[*fcK fe )r- fe (4,4 +1 )]}p(4,4 +1 ,dt;) , 

where P is the GPE described in Section 2.1 (and in more detail in Appendix B). 
Thus, using Lemma A.l yields that (£o-fc+i|fc+i' is consistent for <j)k+i 

and the set 

{/ 6 LVfc+i,X fc+2 ) : L{>, |/|) G L 1 ^^),^ 1 )} = L 1 (^fc + i,X fc+2 ) . 

Finally, we complete the proof by noting that the induction hypothesis is fulfilled 
for k = by assumption. 



A. 2. Proof of Theorem 2.1 

Decompose the error according to 

n-1 

= E 



k=0 



I c,N,9' \ ST^ ifi' a ( A.8' n 
\ ll k(A n )) Z^ W fc(A„) S fc ^fe:fc+l|/c(A„)' U 
i=l 

Sk{Xk:k+l]9) 0fe(A„) {dx k:k +i]9') 



+ b n (A n ,e,e') , (A.n) 

where the bracket terms are errors originating from the GPEPS and the second 
term b n , defined in (2.20), is the cost of introducing the fixed lag. By combining 
Proposition 2.1 with Slutsky's theorem we conclude that 

n , N 



E ( n k(A n ) ) J2 A„ ) lo S 96 (C k fk(A n ) > Y k) 

fe=0 4=1 

n « 

^ / log.9e <^(A„)(dx fc ;#') , (A.12) 



fe=0 ' 



as £o:fc(A„) i ^ \ogge(x k ,Y k ) belongs to L 1 (0 fe(Aii) (•; 6'), X k( - A ^ +1 ) by assump- 
tion. Thus, the second term of the intermediate quantity estimator (2.19) is 
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consistent. In order to establish consistency of the complete estimator it re- 
mains to prove that 



ri—i 
k=0 



N,e' 

k(A n 



l|fc(A„ 



E w fc(A„) E v e {fk 6 k+i\ 

— > ^ / log<?e (x k ,x k +i) 0fc(A„)(dxfe :fc+ i; 6') . (A.13) 

/, — n J 



fe = ' 



To do this, we define U N>i d = wj;^ ELi V e(€k1k+i\k{A n ))l an k{A n ) and ^ 



def 



TV 



^{(^o'fefA )|fc(A )' w fe(A ))i=i) and appeal to Theorem A.l and Proposition 2.1. 
Since log Zfc+i) < / \v\P e {x k ,x k+1 ,dv) for all x k:k+ i G X 2 , the mapping 

£o:fc(A„) logged, x fc+ i) belongs to L 1 ((/) /! . (An) (-;6''),X fc ( A ") +1 ). Hence, 



J^E [U N ,i \? N ] = (Ojfj E W^ An) log ?fl (efc : fc + l| fc( A„ 

i=i 

logge(xfc,a;fc+i)^ fc (A n )(<ia;fc:fc+i;0 / ) , (A.14) 



from which we conclude that (A.13) may be established by verifying the two 
assumptions of Theorem A.l. Following (A.14) and using again that £o:fc(A„) i— > 
J \v\Pg(xk, Xk+i, dv) belongs to L 1 ^^,,) (•; 6'), X fc ' A "' +1 ) by assumption, we 
conclude that 



N 



E E [\U Nti \ |J"jv] II M p e(x k ,x k+ i 7 dv) fe (A„)( da; fe:fc+i; ) 

which verifies Assumption (i) (by tightness of sequences converging in probabil- 
ity). To verify (ii), let e > and set A N d = E W^,i\\ I&nA > e|-^jv]. Then, 
for any constant C > 0, by consistency of the particle sample, 



AnI < C max w?', A \ > ef2 



On the other hand, 



Ki<N 



iN.e' 



fc(A„) ^ " fc(A„) 



. 



(A.15) 



A N 1 { Cmax N U; k{An) < efi fc(A|i) 



N 

<E E 



|fe(A„) 







> Ca 





-l N r 

E<a„) / 



|«i| P e 0Q (a;fc,Xfc + i,dvi :g ) . 
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Now, since, for all Xk-.k+i £ X 2 , 

|t>i| Pf a (x k ,x k+ i,dv 1:& ) < I \v\P s (xk,x k+ i,dv) , 



I 12°=i v t \>Ca 

we get, using Proposition 2.1, 

_ x N „ 
( n k(A n )) XX'(A„) / |t;i|P® Q (x fc ,x fe+ i,dw 1:a ) 

V ' l= l J\-£f =1 vc\>Ca 

\v 1 \P^ a (Xk,X k+1 ,dVl :S ) 4> k ^ n )(dx k . k+ i\6') ■ (A. 16) 

We now note that the limit in (A. 16) can be made arbitrarily small by in- 
creasing C. This verifies condition (ii) in Theorem A.l, which completes the 
proof of (A. 13). Finally, combining (A. 13) with (A. 12) completes the proof of 
Theorem 2.1. 



Appendix B: More on the GPE 

The outline of this section follows Beskos et al. (2006) and Fearnhead et al. 
(2008), and we limit our scope to the one-dimensional case; multivariate ex- 
tensions are treated by Beskos et al. (2008). Let (C[0, t], C[0, t]) be the measur- 
able space of continuous functions on [0,t] and denote by Bxg the law of X on 
(C[0,t],C[0,i\) for the initial condition X = W = x. Also, let WC' 1 - 1 '' be the 
law, on the same space, of the Brownian bridge process W = (W s )o< s <t starting 
in x at time zero and ending in x' at time t. Similarly, denote by fit, x, x' s the law 
of the diffusion bridge obtained when X is conditioned to start at Xq = Wo = x 
and to finish at X t = x'. Recall the definition (2.1) of a(-,9) and let 

pU 

A(u, 0) d = / a(v, 9) dv 



be any antiderivative of a(-, 9). The role of Assumptions (A1-A3) is to guarantee 
that fii j x q is absolutely continuous with respect to W^^^ ' with Radon- 
Nikodym derivative 

dfix, x , ijQ . . 

= 7 $ ex P ( A (x\ 0) - A(x, 9)- \ f\a 2 + a')(w s ,0) ds) , (B.l) 

qg{x,x',t) V 2 J J 

where w £ C[0, t] and Aft denotes the density function of the zero mean normal 
distribution with variance t. Now, define, for ufl, the drift functional 

, . def a 2 (u,9) + a'(u,9) 
<p(u,6) = 1(9) , 
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where 1(9) is the lower bound given in Assumption (A3). The transition density 
qg can, using (B.l), be expressed as 

q g (x, x',t)= J\f t (x' - x) exp (A(x', 9) - A(x, 9) - l(9)t) 

4>{w s ,e)ds] W^' x ' x '\dw) , 



(B.2) 



exp 

Accordingly, we wish to calculate expectations of the form 
J exp (- J f(w s )ds^j W^'^idw) . 

Now assume that it is possible to simulate simultaneously a pair (WJ , W~j) of 
random variables and a trajectory (W S Y S=0 such that 

WJ < f(W s ) < W+ , for all s G [0, t] ; 

in practice this will most often be carried through by first simulating a maximum 
and a minimum of the Brownian bridge process W and hereafter interpolating, 
using Bessel bridges, the rest of the bridge conditionally on these. Let re be a 
discrete random variable having, conditionally on W^ 7 probability distribution 

p t (-\Wf). Then it is easily established that the GPE 



(associated with p t ) is an unbiased estimator of (B.2). Here (tpe)e>i are mutually 
independent variables that are uniformly distributed over [0, t] and independent 
of J- t . Note that the distribution p t can be chosen freely, yielding a whole class 
of GPEs, and an optimal choice is discussed by Fearnhead et al. (2008). In all 
applications considered in this paper we will use let re be Poisson-distributed. 
Using the Girsanov theorem, it can be shown that 

logq t (x,x') = -^log(27rf) - (X ~ X) 

+ A(x', 9) - A(x, 9) - l(9)t ~ J (J <l>(Ws,0) ds^j fix, x', t(dw) , (B.3) 

Since the right hand side of (B.l) can be bounded from above and below, 
a rejection sampler producing samples from the diffusion bridge can be con- 
structed. This is possible as the right hand side of (B.l) is proportional to 
the probability that a marked Poisson process on [0,t] x [0,1] with intensity 

r d = sup x {(f>(x); WJ < x < W^} is below the graph s h-> <f>(W a ;9)/r. How- 
ever, while observing the path for all s is impossible, a finite construction can 
be devised by sampling the Brownian bridge at points specified by the marked 
Poisson process; we refer to Beskos et al. (2006) for details. The algorithm is 
described by the following. 
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Algorithm 4 

(* Sampling a skeleton of a diffusion bridge *) 

1. simulate an outcome (xe, V^)?=i °f the- marked Poisson process with inten- 
sity r and K ~ Po(r); 

2. conditional on W 7 ^, simulate (W xe )^ =1 ; 

3. if^(W X4 )/r<^_ 

4. then return (Wxi)i=\ 

5. else go to (1) 



By interpolating the returned skeleton (W xe )g =1 , sam pl cs Win with (VF S )* =0 ~ 
fix, x',t, can be obtained for any < u < t. Given samplcs from the diffusion 
bridge, an unbiased estimator of (B.3) can be straightforwardly constructed in 
the following way. Let i\i ~ Unif(0,f) be independent of Tf Then —t(j>(W^,9) is 
an unbiased estimator of f(f (t>{w s , 9) ds) fix, x' , t(dw) since 



E 



= E 



E 



E / 0(W s ,9)ds 



w s ,9)ds\ Qx,x',t(dw) 



Finally, plugging this estimator into (B.3) yields an unbiased estimator of loggt. 
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